ROCm and HIP: A Detailed 10-Chapter Tutorial: The Parallel Pivot: Mapping Sequential Logic to GPU Threads

The Parallel Pivot is the fundamental shift in computational philosophy from a temporal sequence (doing one thing after another) to a spatial distribution (doing everything at once across a grid).

1. The Independence Heuristic

This is the golden rule of GPU computing: "Whenever your problem is 'apply something independently to N elements', this is the first approach to try." This data-parallel approach is the low-hanging fruit of GPU acceleration: for large N, the overhead of managing threads is negligible compared with the throughput gained by running them concurrently.
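To make the independence test concrete, here is a minimal host-side C++ sketch (not HIP code) that contrasts a loop whose iterations are independent with one whose iterations are not. The function names `scale_and_shift`, `apply_independent`, `apply_independent_reversed`, `running_sum`, and `check_order_independence` are hypothetical, chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-element operation: the result depends only on its own input.
inline float scale_and_shift(float x) { return 2.0f * x + 1.0f; }

// Independent: iteration i reads in[i] and writes out[i], nothing else.
// Any visiting order gives the same result, so each i could be its own GPU thread.
std::vector<float> apply_independent(const std::vector<float>& in) {
    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = scale_and_shift(in[i]);
    }
    return out;
}

// The same work visited back to front, to show that order does not matter.
std::vector<float> apply_independent_reversed(const std::vector<float>& in) {
    std::vector<float> out(in.size());
    for (std::size_t i = in.size(); i-- > 0;) {
        out[i] = scale_and_shift(in[i]);
    }
    return out;
}

// Dependent: iteration i needs the accumulator left behind by iteration i - 1,
// so this loop cannot be mapped one-thread-per-element as written.
std::vector<float> running_sum(const std::vector<float>& in) {
    std::vector<float> out(in.size());
    float acc = 0.0f;
    for (std::size_t i = 0; i < in.size(); ++i) {
        acc += in[i];
        out[i] = acc;
    }
    return out;
}

// Self-check: the independent loop yields identical output in either order.
void check_order_independence(const std::vector<float>& in) {
    assert(apply_independent(in) == apply_independent_reversed(in));
}
```

A quick way to apply the heuristic: if the loop body can be run in any order (or in reverse) without changing the result, it is a candidate for one thread per element. The running sum fails that check and needs a different parallel strategy.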

2. Precision and Payload

HIP kernels typically operate on large arrays of primitive types. High-performance graphics and machine learning often use float (single precision), while scientific simulations that demand extreme numerical stability use double (double precision).
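The trade-off between the two types can be illustrated with a small host-side C++ sketch (plain C++, no GPU required). It accumulates the same step value many times in each precision and compares the drift, and it reports the memory footprint of an array of each type. The helper names `accumulate_steps`, `float_error`, `double_error`, and `array_bytes` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Add `step` to a running total n times, entirely in precision T.
template <typename T>
T accumulate_steps(T step, int n) {
    assert(n >= 0);
    T sum = 0;
    for (int i = 0; i < n; ++i) {
        sum += step;
    }
    return sum;
}

// Absolute drift of the float accumulation from the reference value 0.1 * n.
double float_error(int n) {
    return std::fabs(static_cast<double>(accumulate_steps<float>(0.1f, n)) - 0.1 * n);
}

// Absolute drift of the double accumulation from the same reference value.
double double_error(int n) {
    return std::fabs(accumulate_steps<double>(0.1, n) - 0.1 * n);
}

// Bytes occupied by an array of n elements of type T (the "payload" side).
template <typename T>
std::size_t array_bytes(std::size_t n) {
    return n * sizeof(T);
}
```

With one million steps, the float total drifts visibly from the reference while the double total stays very close to it. The cost is payload: on common platforms a double array occupies twice the bytes of a float array of the same length, which matters when the kernel is limited by memory bandwidth.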

3. From Iteration to Occupation

In CPU code, the processor "visits" the data by looping over it. In GPU logic, each data element "occupies" a thread. You stop writing how to run the loop and start writing what a single worker should do at one specific coordinate.

$$i = \text{blockIdx.x} \times \text{blockDim.x} + \text{threadIdx.x}$$
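The index formula can be traced on the host with a minimal C++ sketch that emulates a one-dimensional launch sequentially. This is not HIP code: `Coord`, `add_one_element`, and `emulate_launch` are hypothetical stand-ins, and the parameters named `blockIdx`, `blockDim`, and `threadIdx` only mimic the built-in variables a real kernel would read. The bounds guard is an assumption of this sketch, added because the grid is rounded up to whole blocks.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the .x component of a HIP built-in coordinate.
struct Coord { unsigned x; };

// What ONE worker does at ONE coordinate: no loop, just a global index.
void add_one_element(const float* a, const float* b, float* c, std::size_t n,
                     Coord blockIdx, Coord blockDim, Coord threadIdx) {
    const std::size_t i =
        static_cast<std::size_t>(blockIdx.x) * blockDim.x + threadIdx.x;
    if (i < n) {  // guard: the last block may contain workers past the end
        c[i] = a[i] + b[i];
    }
}

// Sequential emulation of a launch: every (block, thread) pair runs once.
// On a GPU these pairs would run concurrently instead of in nested loops.
std::vector<float> emulate_launch(const std::vector<float>& x,
                                  const std::vector<float>& y,
                                  unsigned threadsPerBlock) {
    assert(x.size() == y.size() && threadsPerBlock > 0);
    const std::size_t n = x.size();
    std::vector<float> out(n, 0.0f);
    // Round up so that every element is covered by some worker.
    const unsigned numBlocks =
        static_cast<unsigned>((n + threadsPerBlock - 1) / threadsPerBlock);
    for (unsigned block = 0; block < numBlocks; ++block) {
        for (unsigned thread = 0; thread < threadsPerBlock; ++thread) {
            add_one_element(x.data(), y.data(), out.data(), n,
                            Coord{block}, Coord{threadsPerBlock}, Coord{thread});
        }
    }
    return out;
}
```

For example, 5 elements with 4 threads per block need 2 blocks, giving 8 workers with global indices 0 through 7. Workers 5, 6, and 7 fail the `i < n` guard and do nothing, which is why the bounds check belongs in the per-worker body.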

QUESTION 1

What is the primary heuristic for deciding if a problem is suitable for the 'Parallel Pivot'?

The problem requires complex recursion.

The problem involves applying an operation independently to N elements.

The problem must be solved in a strict temporal order.

The problem uses only integer arithmetic.

QUESTION 2

In the context of the Parallel Pivot, what does the term 'Occupation' refer to?

The CPU visiting each index in a for-loop.

How many blocks are currently queued in the GPU.

Data 'occupying' a specific thread at a specific coordinate.

The percentage of memory used by the float arrays.

QUESTION 3

Which primitive data types do HIP kernels most commonly handle across graphics, machine learning, and scientific simulation?

bool and char

int and long

float and double

void and pointer

QUESTION 4

When pivoting a loop into a kernel, what replaces the loop counter `i`?

The return value of the function.

A global thread identity calculated from grid/block dimensions.

The hipMalloc address.

The host-side iteration variable.

QUESTION 5

Fill in the blank: To ensure production reliability even in basic kernels, you must ______.

Only use float types.

Add explicit error-checking macros everywhere.

Use a single thread per block.

Avoid all boundary checks.